arXiv:1507.02372vl [cs.DC] 9Jul2015 


Requests Prediction in Cloud with a Cyclic Window 

Learning Algorithm 


Min Sang Yoon^, Ahmed E. Kamal^, and Zhengyuan Zhu^ 
^Department of Electrical and Computer Engineering 
^Department of Statistics 
Iowa State University, I A, 50010 
{my222^, kamal^,zhuz^} @iastate.edu 


Abstract —Automatic resource scaling is one advantage of 
Cloud systems. Cloud systems are able to scale the number 
of physical machines depending on user requests. Therefore, 
accurate request prediction brings a great improvement in 
Cloud systems’ performance. If we can make accurate requests 
prediction, the appropriate number of physical machines that 
can accommodate predicted amount of requests can be acti¬ 
vated and Cloud systems will save more energy by preventing 
excessive activation of physical machines. Also, Cloud systems 
can implement advanced load distribution with accurate requests 
prediction. We propose an algorithm that predicts a probability 
distribution parameters of requests for each time interval. Maxi¬ 
mum Likelihood Estimation (MLE) and Local Linear Regression 
(LLR) are used to implement this algorithm. An evaluation of the 
proposed algorithm is performed with the Google cluster-trace 
data. The prediction is implemented about the number of task 
arrivals, CPU requests, and memory requests. Then the accuracy 
of prediction is measured with Mean Absolute Percentage Error 
(MAPE). 

1. Introduction 

Predicting users’ requests has been playing an important 
role in operating network systems because the accurate predic¬ 
tion can prevent systems from wasting operational cost and op¬ 
timize resource utilization. Resource utilization is an especially 
important issue in Cloud systems. One advantage of Cloud 
systems is automatic scaling and management. Therefore, the 
accurate prediction of the requests in the Cloud systems will 
bring improvement in operating systems with more advanced 
resource management techniques. 

Cloud systems consists of a lot of computing and network 
devices. Thus, activating the appropriate number of devices 
based on the requests is an important issue in terms of saving 
energy. Data centers consumed 91 billion kilowatt-hours of 
electricity in 2013 and 50 percent of that is wasted due to lack 
of awareness of the traffic according to [1]. This data shows 
how the accurate prediction of the requests will be increasingly 
important in the future with an indisputable increase in energy 
consumption of data centers. Also, precise requests prediction 
has an effect on the performance of Cloud systems. If we 
inactivate overfull computing or network devices for energy 
saving purpose, it will cause a bottleneck and delay in handling 
requests. 

There are many research that propose predicting an ex¬ 
pected value of the requests with several methods. However, 
the prediction of the quantified number of requests is not 
the proper strategy for a stable operation of Cloud systems 
because user requests have heavy fiuctuation frequently. Thus, 
we propose a strategy to predict parameters of the probability 
distribution of the requests in every regular interval instead 
of predicting the actual amount of requests. An advantage of 
predicting the probability distribution is that we can make the 
more fiexible prediction. There are always request bursts in 


networks. If the system predicts a certain amount of requests 
based on the average of past requests. Cloud systems cannot 
prepare immediate variations of requests. However, if we are 
aware of the probability distribution of the requests during 
a given period, we are able to prevent request bursts by 
predicting the requests which corresponds to a high probability 
in the predicted probability distribution. 

Eor the prediction, we collect the history requests data of 
the Cloud and observe the histogram of the data. Based on 
the histogram analysis of the data, we decide the probability 
distribution model fitted to the collected data. Then, we induce 
parameters of the probability distribution model with Maxi¬ 
mum Likelihood Estimation (MLE) and save the parameter 
data to the prediction dataset. After we accumulate enough data 
for the prediction, the prediction model estimates parameters 
of the probability distribution with Local Linear Regression 
(LLR) by using a cyclic window approach. Parameters of 
the probability distribution are the time-dependent data, which 
means the parameters are changed as time goes by. So the 
parameters have obvious patterns during every interval. It will 
reveal the same pattern during a certain period in every day or 
every week. So our prediction model will make the prediction 
by using the accumulated dataset at every same time point 
through several periods. In order to refiect a change in trend 
of the requests, the prediction model maintains the dataset by 
replacing an old parameter data with the recent data. 

We use Google cluster-trace data to test our prediction 
model, which is real measurements of usage requests in Google 
cluster. One week data is employed for a training data and 
the following week data is used for testing in order to verify 
whether the prediction model is accurate. We selected random 
time point to start collecting data so that the prediction model 
can be tested in more practical environment. There are three 
types of data for prediction, the number of task arrivals, CPU 
requests, and memory requests. Since various user requests are 
predicted and have a large variance in its values, we use Mean 
Absolute Percentage Error (MAPE) to normalize the error rate 
and assess the prediction accuracy. 

The rest of the paper is organized as follows. Previous 
works are reviewed in Section II. We will introduce mathe¬ 
matical methods and the prediction algorithm in Section III. 
A system model for an experiment will be demonstrated in 
Section IV and the experiment results will be shown in Section 
V. The conclusions are given in Section VI. 

H. Related work 

Requests prediction in Cloud system is accomplished by 
many researchers for the automatic scaling of systems. 

Akindele A. Bankole et al. employ machine learning tech¬ 
nique for a predictive resource provisioning in Cloud [10]. 
Their prediction model is achieved with machine learning 



techniques: Neural Network, Linear Regression, and Support 
Vector Machine. They predict the CPU utilization, response 
time, and throughput based on collected data from virtual 
machines web servers and database servers. The prediction 
model generates prediction values in every given minute with 
machine learning techniques and measure error rate with 
MAPE and Root Mean Squared Error (RMSE). However, the 
prediction model did not show the high prediction accuracy. 
Their results show 24% prediction error at a certain point of 
time in predicting the CPU utilization and a 21% error in the 
response time prediction. 

Sedeka Islam et al. present more advanced machine learn¬ 
ing techniques for predicting resource usage [5]. The error 
correction Neural Network (ECNN) and the Linear Regression 
techniques are employed for prediction. They included a slid¬ 
ing window method to reflect the current state of the system. 
They have generated prediction values based on window sizes 
and evaluated them with MAPE, PRED(25), RMSE, and 
i?^prediction accuracy. The CPU utilization data is collected 
from the Amazon EC2 cloud through the TPC-W benchmark 
and prediction values are generated with the ECNN and Linear 
Regression method. The prediction values of CPU utilization 
has around 19% error rate without the sliding window and 
has minimum 18.6% error rate when they employ the sliding 
window. 

Many statistical approaches are also applied to the pre¬ 
diction in Cloud. Bruno Lopes Dalmazo et al. propose the 
traffic prediction approach based on the statistical model where 
observations are weighted with Poisson distribution inside a 
moving window [9]. They consider the past information by 
means of a sliding window of size A and this window is 
applied by weighting the past observations according to the 
Poisson distribution with parameter A. Dropbox trace data is 
employed for testing their prediction model and Normalized 
Mean Square Error (NMSE) evaluation method is utilized for 
the error measurement. The prediction model could achieve 
NMSE values between 0.044 and 0.112. The prediction is 
ideal when NMSE value is equal to zero and worse when it is 
greater than one. They could achieve the reasonably accurate 
prediction with this approach. 

They also suggest the traffic prediction model with a 
dynamic window approach [8]. The sliding window size is 
changed based on variance of the previous window size. The 
small variance indicates the predicted data is close to the 
meanwhile the high variance means the predicted data is spread 
out from the mean. So they update the window size in every 
prediction interval by considering the size of the variance in 
the previous prediction. The prediction accuracy is improved 
from 7.28% to 495.51% compared to the previous statistical 
model. 

III. Algorithm 

The prediction model estimates parameters of the probabil¬ 
ity distribution of future user requests in every predetermined 
period. We make the assumption that user requests have 
obvious patterns and the patterns are repeated periodically. Eor 
example, the request pattern of all Mondays will be a similar. 
Hence, the prediction model adopts the history parameter data 
at the same time point in order to make a prediction about the 
future. 

We introduce three time scales for the prediction periods: 
Pattern Period (PP), Target Period (TP) and Utilization 
Period (UP). The PP is a cyclic interval that exhibits pattern 
repetition. The TP is a unit duration for which we want to 
make a prediction. The PP is a cyclic window that we use for 


predicting the activities in TP. In the example, when we intend 
to predict the request distribution during a certain Monday by 
assuming the same pattern is repeated in every week. Then, 
we can set the TP to a day and the PP to a week because 
we assume the pattern of a day is repeated every week. If we 
only use the past Mondays' data for the prediction, the UP 
becomes one day. 

Since we assume patterns are repeated on the every PP, 
any time duration that we want to make a prediction cor¬ 
responds to a certain TP on the PP. Therefore, we can 
predict the request distribution during any time interval by 
corresponding them to a certain TP on the PP. Eor the precise 
prediction, we accumulate data for several PPs. Although 
patterns of the traffic distributions will be similar on the same 
time point in every PP, the traffic amount will be different. 
In other words, we can say the distribution of the traffic shows 
similar form in every Monday, but we cannot ensure that 
the amounts of traffic will be same. Therefore, the prediction 
model can achieve higher prediction accuracy by accumulating 
data during several PPs. 



Eig. 1: Prediction dataset 


To implement prediction, the prediction dataset saves the 
past data in a m x / matrix, m represents the number of TPs on 
the PP and I denotes the number of PPs we accumulate. In 
Eigure 3, each vertical block corresponds to saved parameters 
of the probability distribution in each TP during a PP. We 
start to stack the data from the first block of the first iteration. 
If we reach the TP^’s block, which is the last TP, we move to 
the second iteration and stack the data from the first block of 
the second iteration, which means we have saved data during 
a PP. When the matrix is full, we go back to the first block 
of the first iteration and replace the old data to reflect the 
tendency of recent requests. 

Any time duration that we want to predict the traffic 
distribution for can be related to a certain TP on the matrix. 
In order to make a prediction, we employ the UP data. In 
Eigure 3, we can see that distribution parameters of the TPm 
can be predicted by using the UPm- We can set the size of 
the UP depending on how many previous TPs will affect to 
the state of the current TP. Eor example, if we set the PP to 
two days, we can say previous Sunday and Monday's history 
parameters will affect to next Monday's traffic distribution as 
we can see in Eig. 3. 

In this paper, we forecast the number of task arrivals during 
each target period. The first step of prediction, the prediction 
model constructs a histogram of user requests in every target 
period to observe distributions of requests. Then, it fits a 
probability distribution to the distribution of requests. When 
the probability distribution kernel is decided for fitting, we 
will adopt MLE in order to obtain parameters of the probability 
distribution in every observed period and the parameters are 
saved on the dataset. After the dataset accumulates enough data 
for prediction, it is able to predict parameters of a following 























target period. Local Linear Regression (LLR) will be employed 
to predict parameters of the future requests. 

A. Histogram 

A histogram is the graphical representation of the dis¬ 
tribution of data. We can observe frequencies and overall 
distribution of given data through a graphical representation. 

Histograms of requests are constructed in every regular 
interval to observe the distribution of requests. After observe 
the histograms, the prediction model decides which of the 
probability distribution model will be the closest to the actual 
distribution of requests. 

B. Maximum Likelihood Estimation (MLE) 

MLE estimates parameters of a probability distribution 
when there are data corresponding to the probability distri¬ 
bution model. Our prediction model employ the Poisson dis¬ 
tribution based on the histogram observation of the experiment 
data. Depending on accumulated request data, the prediction 
model will induce the Poisson distribution parameter by using 
MLE in every TP. 

Poisson distribution has the only parameter A. Since the 
prediction model has data through observation, the number 
of task arrivals, CPU, and memory request, it is able to 
induce parameter A by using MLE method. If we observe 
n independent datasets XI, X2, X3, ..., Xn iid Poisson 
random variables, maximum likelihood function L(A) will be: 


L(A) = 




X2 X, 
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If we take log in the equation, log likelihood function becomes: 

n 

l{X) = Y,{Xi\ogX-X-\ogXil) 
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We find maximum of A by finding the derivative of equation: 

l'{X) = \j2x^-n = 0 (3) 


i=l 


, which implies the A that has closest distribution with observed 
histogram is: 




( 4 ) 


C. Local Linear Regression (LLR) 

LLR is one of the kernel smoother techniques for esti¬ 
mating a real value function, when no parametric model for 
this function is known. LLR combines much of the simplicity 
of linear least square regressions by fitting the line about the 
given k number of points with the N number of observed 
points. In the prediction model, the k is corresponded to the 
number of TPs on UPs and N is equivalent to the number of 
history parameters we will employ for the prediction, which 
is called to bandwidth. After fitting the line at every given 
point, the estimation functions Y{TPk) are achieved as a value 
function with the k numbers of values. Khx{Xu^Xi) be a 
kernel defined by: 


( 5 ) 


The D() is a positive real valued function in (5), which is 
decreasing when the distance between Xi and Xu increases. 
The Xu is the given points and the Xi is one of the observed 
data around Xu. Commonly used kernels include the Epanech- 
nikov, biweigh and Gaussian function. Eor one dimension data, 
least-square method is employed for obtaining function value 
on the Xu. 
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min y] KnxiXu, X,) (r(Xi) -a(X„) -^(X„)Xi) 

1=1 

( 6 ) 

The N is the number of history parameter near Xu, that we 
will employ in (6). Since we obtain parameters in each TPi 
by using MLE, the minimum of a{Xu) and (3(Xu) can be 
achieved by solving the weighted least square problem (6). 
If we assume the estimation function on Xu is Y{Xu) = 
a{Xu) + f3{Xu)Xu, the closed form solution of the estimation 
function is like: 

f(X„) = (1, X^)iB^WiX^)B)-^B^W{X^)y (7) 

where: 

y = iYiX,),...,YiXN)f (8) 

W{Xo)=diag(Khx{Xu,Xi)) (9) 

V / NxN 

A ::: A) 

By repeating this process about all given k points, Xu, we can 
get real value estimation functions Y{Xu) about k points. 


D. Cyclic Window Learning Algorithm 

We describe the algorithm that predicts parameters of the 
probability distribution of the future target periods by using a 
cyclic window approach. We assume the dataset has enough 
past data for the prediction and determined a probability 
distribution model for MLE. The algorithm will employ LLR 
to predict the probability distribution parameters of the future 
target periods and MLE for updating dataset. 

The Algorithm 1 obtains the predicted parameters of target 
period (PT) for the prediction and actual parameters of target 
period (AT) to update dataset at every time period. The 
m means the number of TPs included in a UP, which is 
equivalent to a window size. The n is the number of TPs 
during a PP. The I represents how many cycles of PPs will 
be stacked on the prediction dataset (PData). The PData is 
the prediction dataset has the m x / dimension. The TPdatat 
means observed requests during an interval of the TP at time 
t. The prediction model will obtain parameters of the TP dX 
time t with MLE by using this data, TPdatat. 

We initialize variables: t, p, and w (line 1). The t represents 
how many unit periods are passed after the algorithm starts and 
the p is a corresponding position of the TP about time t. Since 
we return to the initial position on the PP when the p reaches 
the end of the PP, the p becomes a cyclic number from 1 to 
m. The u; is a row position on the m x I PData. Too much 
data requires the complexity of the prediction and consumes 
too much time for the prediction. Therefore, we stack only 
the appropriate number of cycles on PData by replacing the 
old data. The w is which row position in the PData will be 
updated. 

Eirst, the algorithm collects the data for the prediction from 
the PData to the UTt (line 2 — line 7). If the position p is 




Algorithm 1 Cyclic Window Learning Algorithm 

Require: PDatamxuTPdatat, m, n, and I 
Ensure: PTt and ATt 
1: t = l,j9 = 1, and w = 1 
2: while System operate do 
3: if p < n then 

4: UPf — PDcitCi(^^_Yi—p)xl • P 

PDataixi • PDatapxiy'^l 

5: else 

6 : UPf — PD(lt(l(^p_Yi)xl • PDOjtOjpxii^l 

7: end if 

8: Implement LLR in terms of UPt 

9 : PPt = Y(UPp) and select kth value 

10: Update databased with actual parameter 

11 : APp = MLE{PPdatat) 

12: Update databased with APp 

13 : PData{p^ w) = APp 

14: f f -j- 1 

15: if p < m then 

16: p=p+l 

17: else 

18: p = 1 

19: if w < I then 

20: W = W 1 

21 : else 

22: W = 1 

23 : end if 

24: end if 

25: end while 


less than the window size n, we need to employ the data from 
the end of the PData because the p is cyclic. So we collect 
the data from m — n—pth to rrith columns’ data and from the 
first to Pth columns of PData (line 3 — line 4). If the p is 
greater than the window size n, we collect previous n columns 
data from the point p on the PData (line 5 — line 7). 

We implement LLR about the collected data, UPt, and 
obtain the prediction value about time t (line 8 — line 9). 
In order to update the PData, we need to obtain the actual 
probability distribution parameters of observed requests. We 
apply MLE about the data PPdatat and update corresponding 
data block in PData (line 10 — line 13). We update the 
position on the PData for next prediction. We update the t for 
predicting next time point (line 14). We just increase the p if 
the position of the p is still on PP. If the p exceeds the size of 
PP, m, it goes back to the initial position of the PP. We also 
update the w when p goes back to the initial position because 
that means one row of PData is filled with new data. The w 
increases when the p increases. However, the w becomes one 
if the w exceeds I, which is vertical size of the matrix PData 
(line 15 — line 23). 

IV. System model 

A. Google Cluster Data collection (Arriving Pasks, CPU, and 
Memory) 

The experiment is implemented with Google cluster-usage 
traces data [14]. Google cluster is a set of computing resources 
composed of thousands of machines. A job is composed of 
several tasks which can be processed separately. So each task 
will be a unit of a process. We consider the number of task 
arrivals, CPU requests, and Memory requests of tasks. Each 
task has a timestamp which represents when the task arrives 


at the cluster. Therefore, the distribution of the number of task 
arrivals can be observed by using the timestamp. The cluster 
data also contains the CPU and memory requests of each task. 
The CPU requests show core counts or core-seconds/second of 
tasks and the memory requests represent how much bytes each 
task requires. The cluster starts measurement 600 seconds after 
the system is operated and has accumulated data for one month 
approximately. We select a random point to collect data for 
the prediction model. One weak data is sampled as a training 
dataset for the prediction modeling and the following week 
data is employed to test the accuracy of the prediction model. 

V. Experiment 

A. Patterns of data 

All experiments are conducted by Matlab. Basically, user 
requests have a regular pattern. Requests increases during the 
daytime and weekdays more than the nighttime and weekend. 
We could observe some patterns by analyzing the Google trace 
data. The start time of trace is randomly chosen in data. So 
we do not know what the exact date or time of measurement 
points. However, our assumption is arriving tasks will show 
regular patterns in every interval. 



Eig. 2: Daily Pattern Analysis 


Eigure 2 shows patterns of task arrivals during a week. The 
number of task arrivals is counted for every one hour, so we 
could observe how many tasks arrive at the cluster in every 
hour during the week. The hourly pattern shows high peaks 
for the first 4 hours and the last 10 hours. So we can assume 
daytime starts around the tenth hours from the measurement 
because the number of tasks is increased from the tenth hour 
and end at the fourth hour in Eigure 2. If we see the daily 
pattern, the first two days and the last two days show higher 
requests. In the same way, we can assume weekdays start on 
the fourth day from the measurement by assuming requests 
increases during weekdays rather than the weekend. 

The daily pattern analysis of task arrivals presents obvious 
patterns of the incoming requests. Therefore, the cyclic data 
collection should be an effective approach for the prediction. 

The daily pattern analysis of the CPU and memory requests 
also have demonstrated the similar patterns with task arrivals. 
The CPU and memory requests have had high requests when 
the number of task arrivals increases and had low requests 
during free periods. 

According to our observation, we have decided to set the 
PP to one week because we could observe hourly and daily 
patterns of requests repeatedly during every week. 


















B. Histogram analysis of data 

According to the experiment, we found that too short TP 
is not enough to observe apparent patterns of distribution of 
requests. Thus, the TP is set to 30 minutes based on em¬ 
pirical observation. The prediction model predicts distribution 
parameters in every 30 minutes. 
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Fig. 3: The number of tasks during the day 


As patterns are observed in Figure 2, the number of task 
arrivals show an apparent pattern depending on the time. Figure 
3 represents the number of task arrivals in every 30 minute 
during the first day. For the first 4 hours and the last 10 hours. 
Figure 3 presents high rate of incoming tasks. 

Histograms of each target period are different depending on 
the total number of task arrivals. Histograms have high peaks 
in the more right side during the busy hours and they have 
high peaks in the more left side during the free hours. 
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Fig. 4: Distribution of the first day 


Figure 4 is a histogram of the first hour. The first histogram 
exhibits the distribution of task arrivals during the first half 
hour and the second histogram presents the distribution of the 
second half hour. The first histogram has a peak in more right 
side than the second histogram because the cluster received 
higher task arrivals during the first half hour than the second 
half hour. 

This trend is similar to the Poisson distribution. The 
Poisson distribution has a high peak in the more right side 
when the rate parameter A is high. Therefore, the observed 
data will be fitted to the Poisson distribution by using MLE to 
obtain parameters of the Poisson distribution in every duration. 

C. Maximum Likelihood Estimation of distributions 

MLE is employed to obtain parameters of the Poisson 
distribution in every target period. The estimated Poisson 
distribution are achieved about the first half hour histogram 
by using MLE in Figure 5. The first graph shows the Poisson 
distribution with estimated parameters and the second graph 
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Fig. 5: MLE of the first hour 


is a histogram of task arrivals during the first half hour. We 
can observe the estimated Poisson distribution has a similar 
distribution with the histogram of data. The prediction model 
implements MLE in every 30 minute and saves them to the 
prediction dataset. 

D. Time Dependent Parameter Estimation 

Parameters are induced from the request data in every 30 
minute. Parameters generated at the same time point on the 
PP are stacked in the same column of the prediction dataset. 
Predicted parameters are achieved by implementing LLR about 
the corresponding UP. For example, if we implement LLR 
about the UP including the first to the 10th TP to predict 
the duration corresponding to 10th TP, the last point value of 
LLR function becomes the prediction value of the 10th target 
period. Prediction values are changed depending on how to 
set the utilization periods and how much bandwidth we adopt 
for LLR. The bandwidth represents how many near data are 
included when we predict the function value of a certain point. 



time 

Fig. 6: Poisson distribution parameter A estimation of arrival 

tasks 


Figure 6 represents the prediction of Poisson distribution 
parameter A of arriving tasks during week. Since we set the 
TP to 30 minutes, we have 336 target periods during the week. 
Blue points represent parameter values of training dataset in 
each T P and green points are parameter values of test dataset 
in each TP. Solid lines represent parameter prediction values 
for different values of bandwidth. We set the PP to 25 hours 














































in Fig. 4, which means that prediction value is obtained base 
on the last 25th hours data. Parameter A is equivalent to mean 
number of arrivals during 30 minutes. We can observe that the 
graph has a regularly repeated pattern. It has seven high peaks 
in the graph, which means similar patterns repeated during the 
week. 
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Fig. 7: Poisson distribution parameter A estimation 
(a)CPU (b)Memory 


Prediction values of parameters are obtained about the CPU 
and memory requests as well in the same method. Figure 7 is 
the parameter prediction of the CPU and memory requests. We 
obtain prediction values in the same method with task arrivals. 

E. Error assessment 

In order to quantify an accuracy of the prediction, we 
measure Mean Absolute Percentage Error (MAPE) between 
the prediction data and the test dataset. MAPE expresses an 
error rate as a percentage. So we can compare the prediction 
accuracy of task arrivals, CPU requests, and memory requests 
with a normalized error rate value. 

MAPE= -V 1^-^' (11) 

The Pj is a predicted value of a target value, Tj. MAPE 
value is equal to zero when the prediction model is the perfect 
fit to the target value and increased when the prediction is not 
properly fit to target values. 

Figure 8 is MAPE measurement graph of the Poisson 
distribution parameter A. The parameter A has MAPE range 
between 0.3885 and 0.5194. If we consider the ideal state of 



Fig. 8: MAPE measurement of task arrivals prediction 


MAPE is zero, the prediction model has enough prediction 
accuracy. The prediction model could achieve a higher accu¬ 
racy with the longer U P, which is equivalent the window size 
because increasing the UP means employing more previous 
data for the prediction. However, the large UP requires more 
complexity of a computation and consumes more time. In other 
words, a proper selection of the UP is required to satisfy 
both of the prediction accuracy and the computation time. 
Choosing the best bandwidth is also an important issue in order 
to reduce the prediction error. Too small bandwidth causes very 
spiky estimates while large bandwidth leads over smoothing. 
If data values are spread widely, the smaller bandwidth will 
not acquire the higher prediction accuracy. 



utilization Period 

Fig. 9: MAPE measurement of task arrivals prediction 


We compare the prediction accuracy of cyclic window 
learning algorithm with Poisson based statistical algorithm that 
suggested in [9]. We selected the prediction values obtained 
with bandwidth 20 and compared them with the prediction 
result of Poisson based statistical algorithm in Figure 9. In 
the result, we could observe that the cyclic window learning 
algorithm presents slightly better prediction accuracy when we 
employed small UPs. However, the cyclic window learning 
algorithm could improve the performance up to 13.6% when 
we employ more UPs data. 

VI. Conclusions 

We propose a novel approach for the request prediction 
in Cloud systems. Instead of predicting an actual amount of 
requests, our prediction model estimates parameters of the 

































probability distribution during the given period. We accumu¬ 
late the historical data of the system and a cyclic window 
approach to utilize data with MLE and LLR. In the experiment 
with Google cluster-trace data, we could ensure advanced 
performance of the prediction algorithm. Our prediction model 
achieves the very low level of error rates in predicting the 
probability distribution parameters. 
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